The AR (Auto-Regression) model of order $p$ is given by
$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \epsilon_t, \qquad \epsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad (1.1)$$
for $t = p+1, \ldots, T$. In matrix notation, $y = X\phi + \epsilon$, where each $x_t$
is regressed on its own lagged values $x_{t-1}, \ldots, x_{t-p}$.
For predicting $x_{T+1}$, plug $t = T+1$ into (1.1):
$$\hat x_{T+1} = \hat\phi_0 + \hat\phi_1 x_T + \cdots + \hat\phi_p x_{T+1-p}.$$
Note that $x_T, x_{T-1}, \ldots, x_{T+1-p}$ are all observed: they are the last $p$ observations. Then for $x_{T+2}$,
$$\hat x_{T+2} = \hat\phi_0 + \hat\phi_1 x_{T+1} + \hat\phi_2 x_T + \cdots + \hat\phi_p x_{T+2-p}.$$
Here $x_{T+1}$ is not observed, but we can replace it with the predicted value $\hat x_{T+1}$. Recursing in this way, we can predict the value at any future time point.
We can view AR as simply a regression of the observed time series on lagged versions of itself.
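As a concrete sketch of the recursive prediction scheme above, here is a minimal Python implementation. The function name `ar_forecast` and its interface are illustrative, not from the notes; it assumes the Gaussian AR($p$) setup and already-fitted coefficients.

```python
import numpy as np

def ar_forecast(x, phi0, phi, h):
    """Recursively forecast h steps ahead from a fitted AR(p) model.

    x    : observed series (1-D array); its last p values seed the recursion
    phi0 : intercept estimate
    phi  : array [phi_1, ..., phi_p] of lag coefficients
    h    : number of steps to predict
    """
    p = len(phi)
    hist = list(x[-p:])            # the last p observations
    preds = []
    for _ in range(h):
        # x_hat_t = phi0 + phi_1 * x_{t-1} + ... + phi_p * x_{t-p}
        x_hat = phi0 + sum(phi[j] * hist[-1 - j] for j in range(p))
        preds.append(x_hat)
        hist.append(x_hat)         # unobserved values are replaced by predictions
    return np.array(preds)
```

For example, with $p=1$, $\hat\phi_0 = 0.5$, $\hat\phi_1 = 0.9$ and last observation $x_T = 2$, the first step gives $0.5 + 0.9 \cdot 2 = 2.3$, and the second step feeds $2.3$ back in.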
2 Relation to MLE
Start with the joint density of all observations, factored by successive conditioning:
$$p(x_1, \ldots, x_T) = p(x_1, \ldots, x_p) \prod_{t=p+1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}). \qquad (2)$$
Under the AR($p$) model, $p(x_t \mid x_1, \ldots, x_{t-1}) = N(x_t \mid \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}, \sigma^2)$.
2.1 Usual Regression
(2) looks just like the likelihood of the usual regression model $y_i = z_i^\top \beta + \epsilon_i$ with $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$. For this, given independence,
$$p(y_1, \ldots, y_n \mid z_1, \ldots, z_n, \beta, \sigma^2) = \prod_{i=1}^{n} N(y_i \mid z_i^\top \beta, \sigma^2). \qquad (3)$$
Now replace $y_i$ by $x_t$ and $z_i$ by the vector of lagged values $(1, x_{t-1}, \ldots, x_{t-p})^\top$:
$$\prod_{t=p+1}^{T} N(x_t \mid \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}, \sigma^2).$$
To sum up, here are the assumptions we made:

- independence of $\epsilon_1, \ldots, \epsilon_n$;
- the model equation (linearity): $y_i = z_i^\top \beta + \epsilon_i$;
- independence of $\epsilon_i$ and $z_i$;
- the density of $z_i$ is independent of $(\beta, \sigma^2)$.
2.2 Back to AR(1)

When we try to apply (3) to (2), these assumptions don't hold exactly: the "covariates" are themselves lagged responses, and the density of the initial observations is not free of the parameters. Take $p = 1$ for concreteness. We can't get $p(x_1)$ from the model equation (1.1), which only starts at $t = 2$. There are two approaches to deal with it.

First, we simply assume $p(x_1)$ does not depend on $(\phi_0, \phi_1, \sigma^2)$. So
$$L(\phi_0, \phi_1, \sigma^2) \propto \prod_{t=2}^{T} N(x_t \mid \phi_0 + \phi_1 x_{t-1}, \sigma^2). \qquad (4)$$
It is easy to verify that the estimates are identical to those obtained in (2.1). We call this the conditional MLE.
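Since the conditional MLE coincides with ordinary least squares on the lagged values, it can be sketched in a few lines of Python. The function name `ar_conditional_mle` is illustrative; the code assumes the Gaussian AR($p$) setup above.

```python
import numpy as np

def ar_conditional_mle(x, p):
    """Conditional MLE for AR(p): OLS of x_t on (1, x_{t-1}, ..., x_{t-p})."""
    T = len(x)
    y = x[p:]                                        # responses x_{p+1}, ..., x_T
    X = np.column_stack([np.ones(T - p)] +
                        [x[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # (phi0_hat, phi1_hat, ..., phip_hat)
    resid = y - X @ coef
    sigma2 = resid @ resid / (T - p)                 # MLE of the noise variance
    return coef, sigma2
```

On a noiseless series generated by $x_t = 1 + 0.5\,x_{t-1}$ the fit recovers the coefficients exactly, which is a quick sanity check of the lag alignment.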
Second, we extend the model equation back in time, to $t = 0, -1, -2, \ldots$. Substituting repeatedly,
$$x_1 = \phi_0 + \phi_1 x_0 + \epsilon_1 = \phi_0 \sum_{j=0}^{k-1} \phi_1^j + \phi_1^k x_{1-k} + \sum_{j=0}^{k-1} \phi_1^j \epsilon_{1-j}.$$
If $|\phi_1| < 1$, the coefficient $\phi_1^k$ is very small for large $k$, so
$$x_1 \approx \frac{\phi_0}{1 - \phi_1} + \sum_{j=0}^{\infty} \phi_1^j \epsilon_{1-j}.$$
Since $\epsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2)$,
$$E(x_1) = \frac{\phi_0}{1 - \phi_1} \quad \text{and} \quad \operatorname{Var}(x_1) = \sigma^2 \sum_{j=0}^{\infty} \phi_1^{2j} = \frac{\sigma^2}{1 - \phi_1^2}.$$
Thus when $|\phi_1| < 1$,
$$x_1 \sim N\!\left( \frac{\phi_0}{1 - \phi_1},\; \frac{\sigma^2}{1 - \phi_1^2} \right),$$
so finally
$$L(\phi_0, \phi_1, \sigma^2) = N\!\left( x_1 \,\Big|\, \frac{\phi_0}{1 - \phi_1},\; \frac{\sigma^2}{1 - \phi_1^2} \right) \prod_{t=2}^{T} N(x_t \mid \phi_0 + \phi_1 x_{t-1}, \sigma^2). \qquad (5)$$

Now compare (4) and (5). (5) is referred to as the full likelihood for $(\phi_0, \phi_1, \sigma^2)$, and maximizing it gives the full MLE. (4) and (5) will be quite close when $|\phi_1| < 1$ and $T$ is large, since they differ by a single factor.
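The only difference between the conditional and full log-likelihoods is the stationary density of $x_1$, and a small numerical check makes this concrete. The helper names below (`norm_logpdf`, `ar1_logliks`) are illustrative, not from the notes.

```python
import numpy as np

def norm_logpdf(z, mean, var):
    """Log-density of N(mean, var) evaluated at z."""
    return -0.5 * (np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

def ar1_logliks(x, phi0, phi1, sigma2):
    """Conditional (4) and full (5) log-likelihoods for a stationary AR(1)."""
    # conditional part: product over t = 2, ..., T of N(x_t | phi0 + phi1 x_{t-1}, sigma2)
    cond = np.sum(norm_logpdf(x[1:], phi0 + phi1 * x[:-1], sigma2))
    # stationary distribution of x_1 (requires |phi1| < 1)
    m1 = phi0 / (1 - phi1)
    v1 = sigma2 / (1 - phi1 ** 2)
    full = norm_logpdf(x[0], m1, v1) + cond
    return cond, full
```

The two values differ by exactly the one extra log-density term, which becomes negligible relative to the sum as $T$ grows.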
2.3 AR($p$)

The model is given by
$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \epsilon_t, \qquad \epsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2).$$
The likelihood is
$$p(x_1, \ldots, x_p \mid \phi, \sigma^2) \prod_{t=p+1}^{T} N(x_t \mid \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}, \sigma^2).$$
The conditional likelihood is
$$\prod_{t=p+1}^{T} N(x_t \mid \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}, \sigma^2).$$
Here we assume $p(x_1, \ldots, x_p)$ does not depend on $(\phi, \sigma^2)$.
To obtain the parameter estimates, we can directly maximize the likelihood. And since $p(x_1, \ldots, x_p)$ does not depend on $(\phi, \sigma^2)$, this is equivalent to maximizing the conditional likelihood.
2.3.1 Bayesian Approach

However, if we want to derive $p(x_1, \ldots, x_p)$ in a more principled way, we have to extend (1.1) to smaller values of $t$, as in 2.2; for general $p$ this is complicated and not really worth it. We can also set the "stationarity" assumptions on $x_1, \ldots, x_p$ aside and work with the conditional likelihood (much simpler): use the matrix notation $y = X\phi + \epsilon$ (see here), where $y = (x_{p+1}, \ldots, x_T)^\top$ and the row of $X$ for time $t$ is $(1, x_{t-1}, \ldots, x_{t-p})$. Assume the prior $p(\phi, \sigma^2) \propto 1/\sigma^2$; then, by the standard results for the normal linear model (see here),
$$\phi \mid \sigma^2, y \sim N\!\big( \hat\phi,\; \sigma^2 (X^\top X)^{-1} \big), \qquad \text{where } \hat\phi = (X^\top X)^{-1} X^\top y.$$
If inference for $\sigma^2$ is desired, we can use
$$\sigma^2 \mid y \sim \operatorname{Inv-}\chi^2(n - k,\; s^2), \qquad n = T - p,\; k = p + 1,\; s^2 = \frac{\| y - X\hat\phi \|^2}{n - k}.$$
Bayesian inference for the AR model is then identical to that for linear regression models, because the (conditional) likelihood is the same, and Bayesian inference depends on the data only through the likelihood.

Frequentist inference is based on the MLE, given by $\hat\phi = (X^\top X)^{-1} X^\top y$ and $\hat\sigma^2 = \| y - X\hat\phi \|^2 / (T - p)$. The sampling-theory analysis is quite different from that of linear regression, since the rows of $X$ are themselves lagged responses; the results are slightly different, but close.
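Posterior draws of $(\phi, \sigma^2)$ can be generated exactly as in Bayesian linear regression: draw $\sigma^2$ from its scaled inverse-$\chi^2$ marginal, then $\phi$ from its conditional normal. A minimal sketch, assuming the flat prior $p(\phi, \sigma^2) \propto 1/\sigma^2$ and the conditional likelihood; the function name and interface are mine, not the notes'.

```python
import numpy as np

def ar_posterior_samples(x, p, n_draws, rng=None):
    """Draw (phi, sigma2) from the conjugate posterior of the conditional
    AR(p) likelihood under the flat prior p(phi, sigma2) proportional to 1/sigma2."""
    rng = np.random.default_rng(rng)
    T = len(x)
    y = x[p:]
    X = np.column_stack([np.ones(T - p)] +
                        [x[p - j:T - j] for j in range(1, p + 1)])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    phi_hat = XtX_inv @ X.T @ y
    ss = np.sum((y - X @ phi_hat) ** 2)          # residual sum of squares
    L = np.linalg.cholesky(XtX_inv)
    draws = []
    for _ in range(n_draws):
        sigma2 = ss / rng.chisquare(n - k)       # scaled inverse-chi-square draw
        phi = phi_hat + np.sqrt(sigma2) * (L @ rng.standard_normal(k))
        draws.append((phi, sigma2))
    return phi_hat, draws
```

With many draws, the sample mean of the $\phi$ draws concentrates around $\hat\phi$, matching the posterior-mean formula above.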
3 Predictions and Difference Equations
Given a fitted model with estimates $\hat\phi_0, \hat\phi_1, \ldots, \hat\phi_p$, predictions for $t > T$ are obtained by
$$\hat x_t = \hat\phi_0 + \hat\phi_1 \hat x_{t-1} + \cdots + \hat\phi_p \hat x_{t-p}, \qquad (3.1)$$
where the recursion is initialized with $\hat x_t = x_t$ for $t \le T$.
Or we can rewrite it as
$$\hat x_t - \hat\phi_1 \hat x_{t-1} - \cdots - \hat\phi_p \hat x_{t-p} = \hat\phi_0 \qquad (3.2)$$
(initialized by $\hat x_T = x_T, \ldots, \hat x_{T-p+1} = x_{T-p+1}$). (3.2) is called a difference equation of order $p$.
3.1 First Order ($p = 1$)
Now (3.2) becomes
$$\hat x_{T+h} = \hat\phi_0 + \hat\phi_1 \hat x_{T+h-1},$$
initialized with $\hat x_T = x_T$. Convert it to a homogeneous equation (with no intercept term) by taking $y_h = \hat x_{T+h} - \mu$, where $\mu = \hat\phi_0 / (1 - \hat\phi_1)$ (assuming $\hat\phi_1 \ne 1$):
$$y_h = \hat\phi_1 y_{h-1}, \qquad y_0 = x_T - \mu.$$
So
$$\hat x_{T+h} = \mu + \hat\phi_1^h (x_T - \mu).$$
- $|\hat\phi_1| < 1$: $\hat x_{T+h}$ converges exponentially to $\mu$;
- $|\hat\phi_1| > 1$: $\hat x_{T+h}$ explodes to infinity exponentially;
- $\hat\phi_1 = 1$: $\mu$ is undefined; the recursion gives $\hat x_{T+h} = x_T + h \hat\phi_0$, a linear trend;
- $\hat\phi_1 = -1$: $\hat x_{T+h}$ oscillates between $x_T$ and $\hat\phi_0 - x_T$.
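These regimes can be checked numerically by simply iterating the first-order difference equation. A small sketch; the helper `ar1_path` is illustrative.

```python
def ar1_path(x_T, phi0, phi1, h):
    """Iterate the first-order difference equation
    x_{T+h} = phi0 + phi1 * x_{T+h-1}, starting from the observed x_T."""
    path = [x_T]
    for _ in range(h):
        path.append(phi0 + phi1 * path[-1])
    return path

# |phi1| < 1: converges toward mu = phi0 / (1 - phi1) = 2
converging = ar1_path(0.0, 1.0, 0.5, 5)
# phi1 = 1: linear trend x_T + h * phi0
trend = ar1_path(5.0, 1.0, 1.0, 3)
# phi1 = -1: oscillates between x_T and phi0 - x_T
oscillating = ar1_path(3.0, 1.0, -1.0, 4)
```

The converging path also matches the closed form $\mu + \hat\phi_1^h (x_T - \mu)$ term by term.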
3.2 General Case: Bayesian Approach
In the Bayesian context, prediction is done via the joint posterior predictive distribution of $(x_{T+1}, \ldots, x_{T+H})$ conditional on $x_{1:T} = (x_1, \ldots, x_T)$. Now consider the conditional expectations
$$E(x_{T+h} \mid x_{1:T}), \qquad h = 1, \ldots, H. \qquad (3.3)$$
First calculate, for fixed $\phi$,
$$E(x_{T+h} \mid x_{1:T}, \phi) = \phi_0 + \sum_{j=1}^{p} \phi_j \, E(x_{T+h-j} \mid x_{1:T}, \phi). \qquad (3.4)$$
If we initialize this with $E(x_t \mid x_{1:T}, \phi) = x_t$ for $t \le T$, then (3.4) can be evaluated in sequence for $h = 1, 2, \ldots, H$.
Now (3.3) becomes
$$E(x_{T+h} \mid x_{1:T}) = E_{\phi \mid x_{1:T}} \big[ E(x_{T+h} \mid x_{1:T}, \phi) \big].$$
We can do one of two things to compute it:

- Generate posterior samples $\phi^{(1)}, \ldots, \phi^{(S)}$ from $p(\phi \mid x_{1:T})$, then
$$E(x_{T+h} \mid x_{1:T}) \approx \frac{1}{S} \sum_{s=1}^{S} E(x_{T+h} \mid x_{1:T}, \phi^{(s)}).$$
- Use the fact that $p(\phi \mid x_{1:T})$ is usually highly concentrated around $\hat\phi$. We then ignore the small uncertainty of $\phi$ around $\hat\phi$:
$$E(x_{T+h} \mid x_{1:T}) \approx E(x_{T+h} \mid x_{1:T}, \hat\phi).$$
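Both options can be sketched in a few lines. The helper names below are illustrative; `samples` stands for posterior draws of $(\phi_0, (\phi_1, \ldots, \phi_p))$, however obtained.

```python
import numpy as np

def expected_path(x_hist, phi0, phi, h):
    """E[x_{T+1}, ..., x_{T+h} | x_{1:T}, phi] via the recursion (3.4):
    each computed expectation feeds back in as a 'lagged value'."""
    p = len(phi)
    vals = list(x_hist[-p:])
    out = []
    for _ in range(h):
        e = phi0 + sum(phi[j] * vals[-1 - j] for j in range(p))
        out.append(e)
        vals.append(e)
    return np.array(out)

def posterior_mean_forecast(x_hist, samples, h):
    """Option 1: average the expected paths over posterior draws of (phi0, phi)."""
    return np.mean([expected_path(x_hist, s0, s, h) for s0, s in samples], axis=0)
```

Option 2 is just `expected_path` evaluated at $\hat\phi$. Note the two options differ beyond one step ahead, because the multi-step expectation is nonlinear in $\phi$; the plug-in version is only an approximation to the posterior average.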